Learning Cross-lingual Representations with Matrix Factorization
Authors
Abstract
We present a matrix factorization model for learning cross-lingual representations. Using sentence-aligned corpora, the proposed model learns distributed representations by factoring the given data into language-dependent factors and one shared factor. Moreover, the model can quickly learn shared representations for more than two languages without undermining the quality of the monolingual components. The model achieves an accuracy of 88% on English-to-German cross-lingual document classification, and a Pearson correlation of 0.8 on Spanish-English cross-lingual semantic textual similarity. While these results do not beat state-of-the-art performance on these tasks, we show that the cross-lingual models are at least as good as their monolingual counterparts.
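As a rough illustration of the idea only (not the paper's actual objective or implementation), the NumPy sketch below jointly factors two sentence-aligned bag-of-words matrices into one shared sentence factor and two language-dependent word factors, using alternating ridge-regularized least squares. The variable names, the bag-of-words inputs, and the update rule are illustrative assumptions.

```python
import numpy as np

# Toy sketch: jointly factor two sentence-aligned bag-of-words matrices
# X_en (n x V_en) and X_de (n x V_de) as X_l ~= S @ W_l, where S (n x k)
# is the shared sentence factor and W_en, W_de are language-dependent
# factors. All names and the alternating least-squares updates are
# illustrative assumptions, not the paper's exact model.

rng = np.random.default_rng(0)
n, V_en, V_de, k = 200, 500, 600, 50
X_en = rng.random((n, V_en))
X_de = rng.random((n, V_de))

S = rng.normal(scale=0.1, size=(n, k))
W_en = rng.normal(scale=0.1, size=(k, V_en))
W_de = rng.normal(scale=0.1, size=(k, V_de))
lam = 1e-2  # ridge term keeps the least-squares solves stable

for _ in range(20):
    # Update language-dependent factors with the shared factor S fixed.
    A = S.T @ S + lam * np.eye(k)
    W_en = np.linalg.solve(A, S.T @ X_en)
    W_de = np.linalg.solve(A, S.T @ X_de)
    # Update the shared factor using both languages jointly.
    B = W_en @ W_en.T + W_de @ W_de.T + lam * np.eye(k)
    S = np.linalg.solve(B, W_en @ X_en.T + W_de @ X_de.T).T

# Rows of S are fixed-length cross-lingual sentence representations.
loss = np.linalg.norm(X_en - S @ W_en) ** 2 + np.linalg.norm(X_de - S @ W_de) ** 2
print(f"reconstruction loss: {loss:.2f}")
```

Because the shared factor S is estimated from both reconstruction terms at once, adding a third language in this sketch would only add another W matrix and another term to the S update, which is in line with the abstract's claim that more languages can be added without retraining the monolingual parts from scratch.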
Similar resources
Learning Cross-lingual Word Embeddings via Matrix Co-factorization
A joint-space model for cross-lingual distributed representations generalizes language-invariant semantic features. In this paper, we present a matrix co-factorization framework for learning cross-lingual word embeddings. We explicitly define monolingual training objectives in the form of matrix decomposition, and induce cross-lingual constraints for simultaneously factorizing monolingual matrices...
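One plausible shape for such a co-factorization objective, written here only as an illustrative sketch (the cited paper's exact losses and constraints may differ), factorizes each monolingual matrix and ties the word factors together through translation pairs:

\[
\min_{W_s,\,C_s,\,W_t,\,C_t}\ \lVert X_s - W_s C_s^\top \rVert_F^2 + \lVert X_t - W_t C_t^\top \rVert_F^2 + \lambda \sum_{(i,j)\in\mathcal{D}} \lVert (W_s)_{i\cdot} - (W_t)_{j\cdot} \rVert_2^2
\]

Here \(X_s\) and \(X_t\) are monolingual co-occurrence matrices, the rows of \(W_s\) and \(W_t\) are word embeddings of the two languages, and \(\mathcal{D}\) is a set of translation pairs supplying the cross-lingual constraint that couples the two factorizations.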
GWU NLP at SemEval-2016 Shared Task 1: Matrix Factorization for Crosslingual STS
We present a matrix factorization model for learning cross-lingual representations for sentences. Using sentence-aligned corpora, the proposed model learns distributed representations by factoring the given data into language-dependent factors and one shared factor. As a result, input sentences from both languages can be mapped into fixed-length vectors and then compared directly using the cosine similarity...
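For concreteness, comparing the shared-space vectors of an aligned sentence pair with cosine similarity could look like the toy snippet below; the vectors here are made up purely for illustration.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two fixed-length sentence vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical shared-space vectors for an aligned English/Spanish pair.
s_en = np.array([0.2, -0.1, 0.7])
s_es = np.array([0.25, -0.05, 0.6])
print(cosine(s_en, s_es))  # values near 1.0 indicate high similarity
```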
A Subspace Learning Framework for Cross-Lingual Sentiment Classification with Partial Parallel Data
Cross-lingual sentiment classification aims to automatically predict sentiment polarity (e.g., positive or negative) of data in a label-scarce target language by exploiting labeled data from a label-rich language. The fundamental challenge of cross-lingual learning stems from a lack of overlap between the feature space of the source language data and that of the target language data. To address...
Annotation Projection-based Representation Learning for Cross-lingual Dependency Parsing
Cross-lingual dependency parsing aims to train a dependency parser for an annotation-scarce target language by exploiting annotated training data from an annotation-rich source language, which is of great importance in the field of natural language processing. In this paper, we propose to address cross-lingual dependency parsing by inducing latent cross-lingual data representations via matrix co...
Limitations of Cross-Lingual Learning from Image Search
Cross-lingual representation learning is an important step in making NLP scale to all the world's languages. Recent work on bilingual lexicon induction suggests that it is possible to learn cross-lingual representations of words based on similarities between images associated with these words. However, that work focused on the translation of selected nouns only. In our work, we investigate whether...
Publication date: 2016